Linked Bernoulli Synopses: Sampling along Foreign Keys

Authors

  • Rainer Gemulla
  • Philipp Rösch
  • Wolfgang Lehner
Abstract

Random sampling is a popular technique for providing fast approximate query answers, especially in data warehouse environments. Compared to other types of synopses, random sampling has the advantage of retaining the dataset's dimensionality; it also associates probabilistic error bounds with the query results. Most of the available sampling techniques focus on table-level sampling, that is, they produce a sample of only a single database table. Queries that contain joins over multiple tables cannot be answered with such samples because join results on random samples are often small and skewed. In contrast, schema-level sampling techniques support queries containing joins by design. In this paper, we introduce Linked Bernoulli Synopses, a schema-level sampling scheme based upon the well-known Join Synopses. Both schemes rely on the idea of maintaining foreign-key integrity in the synopses; they are therefore suited to process queries containing arbitrary foreign-key joins. In contrast to Join Synopses, however, Linked Bernoulli Synopses correlate the sampling processes of the different tables in the database so as to minimize the space overhead, without destroying the uniformity of the individual samples. We also discuss how to compute Linked Bernoulli Synopses which maximize the effective sampling fraction for a given memory budget. Computing the optimum solution is often prohibitively expensive, so approximate solutions are needed. We propose a simple heuristic approach which is fast and seems to produce close-to-optimum results in practice. We conclude the paper with an evaluation of our methods on both synthetic and real-world datasets.
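
The idea of maintaining foreign-key integrity in a synopsis can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it simply draws a Bernoulli sample of a referencing table and then adds every referenced tuple from the parent table, so that no sampled row has a dangling foreign key. The table contents, the function names (`bernoulli_sample`, `sample_with_fk_integrity`) and the sampling rate `q` are all assumptions made for illustration.

```python
import random


def bernoulli_sample(rows, q, rng):
    """Keep each row independently with probability q (Bernoulli sampling)."""
    return [row for row in rows if rng.random() < q]


def sample_with_fk_integrity(child_rows, parent_rows, fk, pk, q, seed=0):
    """Sample the child table, then pull in every parent row it references.

    Hypothetical helper: this only illustrates foreign-key integrity in a
    sample. The parent rows returned here are NOT a uniform sample of the
    parent table, and no attempt is made to minimize space overhead.
    """
    rng = random.Random(seed)
    child_sample = bernoulli_sample(child_rows, q, rng)
    parents_by_key = {p[pk]: p for p in parent_rows}
    needed_keys = {c[fk] for c in child_sample}
    parent_sample = [parents_by_key[k] for k in sorted(needed_keys)]
    return child_sample, parent_sample


# Toy schema (assumed): orders.customer_id references customers.customer_id.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(100)]
customers = [{"customer_id": k, "name": f"c{k}"} for k in range(5)]

order_sample, customer_sample = sample_with_fk_integrity(
    orders, customers, fk="customer_id", pk="customer_id", q=0.1
)
```

Joining `order_sample` with `customer_sample` on `customer_id` loses no sampled order, which is the property that a join over two independently drawn table-level samples generally lacks. Keeping the individual samples uniform while limiting the extra space, as the abstract describes, requires the correlated sampling developed in the paper and is outside the scope of this sketch.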

Similar resources

A New Approach for Optimization of Dynamic Metric Access Methods Using an Algorithm of Effective Deletion

New Challenges in Petascale Scientific Databases; Adventures in the Blogosphere; The Evolution of Vertical Database Architectures: A Historical Review; Query Optimization in Scientific Databases; Linked Bernoulli Synopses: Sampling along Foreign Keys; Query Planning for Searching Inter-dependent Deep-Web Databases; Summarizing Two-Dimensional Data with Skyline-Based Statistic...

Hierarchical Group-Based Sampling

Approximate query processing is an adequate technique to reduce response times and system load in cases where approximate results suffice. In database literature, sampling has been proposed to evaluate queries approximately by using only a subset of the original data. Unfortunately, most of these methods consider either only certain problems arising due to the use of samples in databases (e.g. ...

Thompson Sampling in Switching Environments with Bayesian Online Change Point Detection

Thompson Sampling has recently been shown to achieve the lower bound on regret in the Bernoulli Multi-Armed Bandit setting. This bandit problem assumes stationary distributions for the rewards. It is often unrealistic to model the real world as a stationary distribution. In this paper we derive and evaluate algorithms using Thompson Sampling for a Switching Multi-Armed Bandit Problem. We propos...

Schema Evolution and Foreign Keys: Birth, Eviction, Change and Absence

In this paper, we focus on the study of the evolution of foreign keys in the broader context of schema evolution for relational databases. Specifically, we study the schema histories of six free, open-source databases that contained foreign keys. Our findings concerning the growth of tables verify previous results that schemata grow in the long run in terms of tables. Moreover, we have come to...

Finite Satisfiability of Keys and Foreign Keys for XML

Key and foreign key constraints are useful for XML [5] data in semantic specification, query optimization and, more importantly, for information preservation in data exchange. Several XML proposals, e.g., XML Schema [28] and XML Data [21], support key and foreign key specifications. These constraints, however, may not be finitely satisfiable in the XML context. More specifically, given a DTD D and a fini...

Publication date: 2008